Method for separating a sound frame into sinusoidal components and residual noise

ABSTRACT

This invention relates to a method of determining (10) a second sound frame (20) representing sinusoidal components and, optionally, a third sound frame (30) representing a residual from a provided first sound frame. The method includes the steps of: determining a sinusoidal component in the first sound frame among non-extracted components; determining an importance measure (40) for the first sound frame; extracting the sinusoidal component from the first sound frame, and incorporating the sinusoidal component in the second sound frame; and repeating said steps until the importance measure fulfils a stop criterion (50). The step of determining an importance measure for the first sound frame can be executed before said third step, or between said third and fourth steps. Said method further includes the step of setting the third sound frame to the first sound frame when the importance measure fulfils said stop criterion. This ensures that only the necessary sinusoidal components are extracted for use in a subsequent compression.

This invention relates to a method of determining a second sound frame representing sinusoidal components and an optional third sound frame representing a residual from a provided first sound frame.

The present invention also relates to a computer system for performing the method.

The present invention further relates to a computer program product for performing the method.

Additionally, the present invention relates to an arrangement comprising means for carrying out the steps of said method.

U.S. Pat. No. 6,298,322 discloses an encoding and synthesis of tonal audio signals using a dominant and a vector-quantized residual tonal signal. The encoder determines time-varying frequencies, amplitudes, and phases for a restricted number of dominant sinusoid components of the tonal audio signal to form a dominant sinusoid parameter sequence. These (dominant) components are removed from the tonal audio signal to form a residual tonal signal. Said residual tonal signal is encoded using a so-called residual tonal signal encoder (RTSE).

It is common knowledge, also reflected in the above-mentioned prior art, that in sinusoidal plus residual coding of an audio signal, the audio signal is segmented and each frame is modelled by a sinusoidal part plus a residual part. The sinusoidal part will typically be a sum of sinusoidal components. In most sinusoidal coders the residual is assumed to be a stochastic signal, and can be modelled by noise. When this is the case, the sinusoidal part of the signal should account for all the deterministic (i.e. tonal) components of the original frame.

If the sinusoidal part does not account for all tonal components, some tonal components will be modelled by noise. Because noise is not suitable to model tones, this may introduce artefacts. If the sinusoidal part accounts for more than the deterministic part, sinusoidal components are modelling noise. This is not desirable for two reasons. On the one hand, sinusoids are not suitable to model a noisy signal and artefacts can appear. On the other hand, if these components were modelled by noise, more compression would be achieved.

The state of the art suggests some methods to deal with this issue, i.e., how to obtain a good separation into the sinusoidal and the residual part.

-   S. N. Levine, Audio Representation for Data Compression and Compressed Domain Processing, Ph.D. Dissertation, Stanford University, 1998.
-   S. N. Levine and J. O. Smith III, “Improvements to the switched parametric & transform audio coder,” in Proc. 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, New York, Oct. 17-20, 1999, pp. 43-46.
-   G. Peeters and X. Rodet, “Signal Characterisation in terms of Sinusoidal and Non-Sinusoidal Components,” in Proc. Digital Audio Effects, Barcelona, Spain, 19-21 November 1998.
-   X. Rodet, “Musical Sound Signal Analysis/Synthesis: Sinusoidal+Residual and Elementary Waveform Models,” in Proc. IEEE Time-Frequency and Time-Scale Workshop (TFTS '97), University of Warwick, Coventry, UK, 27th-29th Aug. 1997.

Some methods are fully based on the signal properties.

-   G. Peeters and X. Rodet, “Signal Characterisation in terms of Sinusoidal and Non-Sinusoidal Components,” in Proc. Digital Audio Effects, Barcelona, Spain, November 1998.
-   X. Rodet, “Musical Sound Signal Analysis/Synthesis: Sinusoidal+Residual and Elementary Waveform Models,” in Proc. IEEE Time-Frequency and Time-Scale Workshop (TFTS '97), University of Warwick, Coventry, UK, 27th-29th Aug. 1997.

Others are more based on psychoacoustical considerations.

-   S. N. Levine, Audio Representation for Data Compression and Compressed Domain Processing, Ph.D. Dissertation, Stanford University, 1998.
-   S. N. Levine and J. O. Smith III, “Improvements to the switched parametric & transform audio coder,” in Proc. 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, New York, Oct. 17-20, 1999, pp. 43-46.

Unfortunately, it is not easy to make the separation into the sinusoidal and the residual part, and none of these methods gives fully satisfactory results [see, e.g., G. Peeters, and X. Rodet, “Signal Characterisation in terms of Sinusoidal and Non-Sinusoidal Components,” in Proc. Digital Audio Effects, Barcelona, Spain, November 1998]. It is therefore an object of the current invention to achieve a good separation between the deterministic and the stochastic parts of an input signal, in order to avoid artefacts and to achieve, in a subsequent compression of the separated signals, an optimal and efficient compression or coding.

Said object is achieved, when the method mentioned in the opening paragraph comprises the steps of:

-   determining a sinusoidal component in the first sound frame among non-extracted components;
-   determining an importance measure for the first sound frame;
-   extracting the sinusoidal component from the first sound frame, and incorporating the sinusoidal component in the second sound frame; and
-   repeating said steps until the importance measure fulfils a stop criterion.

Said method has a number of advantages over existing methods. The extra complexity introduced to the coding stage is almost zero. Moreover, the complexity may even be lowered, because the method indicates, in the last step, when to stop extracting sinusoidal components. As a result, no more sinusoids than necessary are extracted in the third step. In addition, psychoacoustic considerations are easily incorporated. Most importantly, the method gives a good stochastic-deterministic balance, taking into account the nature of the input frame, i.e. the nature of said first sound frame.

In a preferred embodiment of the invention, the second step (of determining an importance measure) can be executed before the third step, or can be executed between the third and fourth steps.

In a preferred embodiment of the invention, the method further comprises the step of:

-   setting the third sound frame to the first sound frame, when the importance measure fulfils said stop criterion.

Hereby, the residual (i.e. the third sound frame) is also provided as an input to a subsequent compression of the separated signals (i.e. the second and third sound frames).

In a preferred embodiment of the invention, said step of extracting the sinusoidal component from the first sound frame, and incorporating the sinusoidal component in the second sound frame, further comprises the step of:

-   removing the sinusoidal component from the first sound frame.

It is hereby an advantage that subsequent determination of sinusoidal components and/or the importance measure may be more accurate.

Further alternative embodiments of the invention are reflected in claims 4 through 10.

The invention will be explained more fully below in connection with preferred embodiments and with reference to the drawings, in which:

FIG. 1 shows an embodiment of the invention, in which a stopping criterion indicates when to stop extracting sinusoidal components in the sinusoidal analysis stage, together with an extracted component, which is introduced into a sinusoidal model, and a residual signal;

FIG. 2 shows the results of this method for a piece of music (upper panel), with the number of sinusoids spent in each frame indicated in the lower panel;

FIG. 3 shows a method of determining a second sound frame representing sinusoidal components and an optional third sound frame representing a residual from a provided first sound frame; and

FIG. 4 shows an arrangement for sound processing.

Throughout the drawings, the same reference numerals indicate similar or corresponding features, functions, sound frames, etc.

FIG. 1 shows the introduction of the stopping criterion in the sinusoidal extraction and how an input frame is separated into two different signals: an extracted sinusoidal component, which is introduced into a sinusoidal model, and a residual signal.

The figure shows an embodiment of the invention, where a low-complexity psychoacoustic energy-based stopping criterion is applied in said separation. The figure shows the block diagram of the system. The input frame, reference numeral 10, is input to an extraction method. The extraction method extracts one sinusoidal component in each iteration. After each extraction, two different signals are obtained: the extracted component, which is introduced, i.e. added or appended, into the sinusoidal model, reference numeral 20, and the residual signal, reference numeral 30. Then a psychoacoustic measure or an energy measure, which will generally and commonly be called the importance measure, reference numeral 40, is calculated from the residual signal. From the information provided by said measure, a decision, based on a stop criterion as indicated by reference numeral 50, is made whether or not the residual probably still contains some important tonal components. If it does not, the extraction method must be stopped; otherwise, the extraction continues.

The measure that gives this information consists of the Detectability of the residual signal and the Detectability reduction. The Detectability measure is based on the Detectability of the psychoacoustic model presented in S. van de Par, A. Kohlrausch, M. Charestan, R. Heusdens, “A new psychoacoustical masking model for audio coding applications,” in Proc. IEEE Int. Conf. Acoust., Speech and Signal Process., Orlando, USA, May 13-17, 2002.

The value of the Detectability of the residual indicates how much psychoacoustically relevant power is still left in the residual. If it reaches a value of one or lower at iteration m, the energy left is inaudible. The Detectability reduction indicates how much relevant power has been removed by one extraction with respect to the power remaining before the extraction. The block ‘importance measure calculation’, reference numeral 40, may compute the Detectability of the residual and its reduction according to the equations:

$$D_{m} = \sum_{f} R_{m}(f)\,a(f) = \sum_{f} \frac{R_{m}(f)}{\mathrm{msk}(f)}$$

$$\begin{aligned}
\mathrm{reduction}_{D_m}(m) &= 100 - \frac{100\,D_{m}}{D_{m-1}}\ (\%) \\
&= 100\left(1 - \frac{D_{m}}{D_{m-1}}\right) \\
&= 100\left(\frac{\Delta D}{D_{m-1}}\right)
\end{aligned} \qquad (1)$$

where R_(m)(f) represents the power spectrum of the residual signal, a(f) the inverse of msk(f), i.e. a(f) = 1/msk(f), msk(f) being the masking threshold of the input signal (computed in power), f the frequency bins, m the iteration number and ΔD = D_(m-1) - D_(m) the decrement of Detectability.
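The two quantities of expression (1) can be illustrated with a minimal sketch. The function names and the four-bin spectra below are hypothetical and chosen only so that the arithmetic is easy to follow; in a real coder the masking threshold would come from the psychoacoustic model.

```python
def detectability(residual_power, masking_threshold):
    """D_m: sum over frequency bins of R_m(f) / msk(f)."""
    if len(residual_power) != len(masking_threshold):
        raise ValueError("spectra must have the same number of bins")
    return sum(r / m for r, m in zip(residual_power, masking_threshold))


def detectability_reduction(d_prev, d_curr):
    """Reduction in percent: 100 * (1 - D_m / D_{m-1})."""
    return 100.0 * (1.0 - d_curr / d_prev)


# Hypothetical four-bin example (values made up for illustration).
msk = [2.0, 4.0, 4.0, 2.0]        # masking threshold per bin, in power
r_before = [8.0, 40.0, 4.0, 2.0]  # residual power before an extraction
r_after = [8.0, 4.0, 4.0, 2.0]    # residual power after removing the peak in bin 1

d_prev = detectability(r_before, msk)  # 4 + 10 + 1 + 1 = 16
d_curr = detectability(r_after, msk)   # 4 + 1 + 1 + 1 = 7
reduction = detectability_reduction(d_prev, d_curr)
print(d_prev, d_curr, reduction)
```

In this made-up example, removing a single strong peak lowers the Detectability from 16 to 7, a reduction of a little over 56%, which is the kind of large drop the description associates with a tonal component.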

The Detectability indicates whether the energy left is audible, and the value of its reduction gives an indication of how to differentiate between the deterministic and the stochastic part of the input frame. The reason is that Detectability is usually reduced more when the extracted peak is a tonal component than when it is a noisy component. The extraction algorithm should therefore stop extracting components either when the value of Detectability is equal to or lower than one, or when its reduction reaches a certain value (assumed to correspond to the values of reduction obtained when noisy components are extracted).
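The two stopping conditions can be sketched as a single predicate. This is only an illustration: the function name is hypothetical, and "reaches a certain value" is interpreted here as the reduction falling below a chosen threshold, which is an assumption about the comparison direction.

```python
def should_stop(d_curr, reduction_percent, reduction_threshold):
    """Return True when sinusoid extraction should stop.

    d_curr              -- Detectability of the current residual
    reduction_percent   -- Detectability reduction of the last extraction, in percent
    reduction_threshold -- assumed tuning value, in percent
    """
    inaudible = d_curr <= 1.0                              # energy left is inaudible
    noise_like = reduction_percent < reduction_threshold   # last peak behaved like noise
    return inaudible or noise_like


# Hypothetical values: a strong tonal peak was just removed, so extraction continues.
print(should_stop(d_curr=7.0, reduction_percent=56.25, reduction_threshold=4.5))
```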

It may be noted that the introduced measure should only be combined with a psychoacoustic extraction method, for example psychoacoustical matching pursuit presented in R. Heusdens and S. van de Par (2001), “Rate-distortion optimal sinusoidal modelling of audio and speech using psychoacoustical matching pursuits,” in Proc. IEEE Int. Conf. Acoust., Speech and Signal Process., Orlando, USA, May 13-17, 2002. The reason is that if the extraction method does not use psychoacoustics, the measure can give a poor indication. For instance, if the extraction method is an energy-based extraction method without psychoacoustic considerations (like ordinary matching pursuit), the peak that most reduces the energy will be subtracted at each iteration. If this is the case, the energy reduction may be high, while the Detectability reduction may be low if the peak is not psychoacoustically important. As a result, the extraction method would be stopped, whereas perceptually relevant tonal components may still be left in the signal. Therefore, if the extraction method used does not include psychoacoustics, a variant of the stopping criterion is recommended: Energy reduction, instead of Detectability reduction, should be used as the indicator for the deterministic-stochastic balance.
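The description does not give a formula for the Energy reduction. One plausible form, assumed here by analogy with expression (1) with the masking weighting dropped, would be:

$$E_{m} = \sum_{f} R_{m}(f), \qquad \mathrm{reduction}_{E}(m) = 100\left(1 - \frac{E_{m}}{E_{m-1}}\right)$$

where E_(m) is a symbol introduced here for illustration only, denoting the unweighted power of the residual at iteration m.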

Unlike the previously mentioned solutions, this solution makes the decision during the extraction. Therefore, the only thing that introduces complexity to the system is the computation of the measure at each iteration, m. However, if the method is combined with a psychoacoustic extraction method, the complexity introduced is negligible, as the masking threshold is already computed by the extraction method.

As an alternative to the measures discussed so far, i.e. the psychoacoustic measure and the energy measure, other measures may be considered as the importance measure.

Psychoacoustics is another word for auditory perception (i.e. the response of the human auditory system to sound). In the psychoacoustic measure the human response is taken into account. Thus, the psychoacoustic measure is an example of an importance measure that incorporates the human response to sound. However, this is a specific embodiment. Of course, it is also possible to make more advanced implementations of auditory perception. In addition, importance measures that do not take into account the human response to sound are also useful. An example of such an importance measure is the mentioned energy measure.

FIG. 2 shows the results for the stopping criterion applied to a piece of music (upper panel). The number of sinusoids spent in each frame is indicated in the lower panel.

In order to check the usability of the measure to differentiate between the stochastic and the deterministic part of the (input) signal, the stopping criterion of reference numeral 50 was implemented in a sinusoidal coder and tested. The chosen coder was the SiCAS coder (Sinusoidal Coding of Audio and Speech). In its default situation, a fixed number of peaks is extracted at each frame.

The extraction method used is psychoacoustical matching pursuit presented in R. Heusdens and S. van de Par (2001), “Rate-distortion optimal sinusoidal modelling of audio and speech using psychoacoustical matching pursuits,” in Proc. IEEE Int. Conf. Acoust., Speech and Signal Process., Orlando, USA, May 13-17, 2002.

At each iteration, it extracts the most psychoacoustically relevant peak, according to the masking threshold of the input signal. Therefore, the masking threshold in expression (1) does not need to be computed, as it is already computed by the extraction method.

The threshold value of reduction was not set to one unique value. Instead, a range of values was chosen (from 3.5 up to 5.5 in steps of 0.25). Then, a group of speech signals and one audio signal were coded using each of these values. The same signals were also coded with a fixed number of sinusoids per frame (from 12 up to 20) in order to compare both situations.

Informal listening experiments yielded the results explained below.

To compare the two different situations (with the stopping criterion according to the invention and with a fixed number of sinusoids), a pair of coded-decoded signals is chosen such that their quality is the same. Then, two results are obtained. Firstly, when using the stopping criterion the allocation of sinusoids is better than when a fixed number (of sinusoids) per frame is extracted. In other words, the allocation of sinusoids gives a better deterministic-stochastic balance. The figure shows how the sinusoids are allocated in one piece of a coded exemplary song, randomly chosen. The tendency that can be seen in the figure is that a higher number of sinusoids is spent where the (input) signal is more harmonic, i.e. in the voiced part in the middle, than where it is more noisy, i.e. in the unvoiced parts at the beginning and end.

This better allocation of sinusoids can easily be noticed by listening to the sinusoidal part of the coded signal. The voiced parts are then clearly audible (and thus modelled), while the unvoiced parts cannot be heard (because they are not modelled by the sinusoidal model).

Secondly, the total number of sinusoids used in the whole piece of music is usually reduced and, as a result, so is the bit rate.

Wherever the wording “sound” is used throughout this application, it is intended to designate human speech, audio, music, tonal and non-tonal components, or coloured and non-coloured noise in any combination; it may be applied as input to said extraction method and also to the method discussed in the following.

FIG. 3 shows a method of determining a second sound frame representing sinusoidal components and an optional third sound frame representing a residual from a provided first sound frame.

The first sound frame corresponds to the previously mentioned input signal and represents sinusoids and a residual, the second sound frame represents sinusoids and the third sound frame represents the residual. The second and third sound frames may initially be empty or may contain content from applying this method to a previous (first) sound frame.

In step 90, the method is started in accordance with shown embodiments of the invention. Variables, flags, buffers, etc., keeping track of input (first) and output (second and third) sound frames, components, importance measures, etc., corresponding to the sound signals being processed are initialised or set to default values. When the method is iterated a second time, only corrupted variables, flags, buffers, etc., are reset to default values.

In step 100, a sinusoidal component in the first sound frame may be determined. Typically said component will represent some important sound information, i.e. it primarily comprises tonal, non-noisy information.

The simplest determination technique (for said component determination) consists of picking the most prominent peaks in the spectrum of the input signal, i.e. of the first sound frame. The original audio signal is multiplied by an analysis window and a Fast Fourier Transform is computed for each frame:

$$X_{l}(k) = \sum_{n=0}^{N-1} w(n)\,x(n + lH)\,e^{-j\omega_{k} n}, \qquad l = 0, 1, 2, \ldots$$

where x(n) is (a frame of) the original audio signal, w(n) the analysis window, ω_k the frequency of the k^(th) bin (2πk/N) in radians, N the length of the frame in samples, l the number of the frame and H the time advance of the window.
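A minimal sketch of this analysis step is given below. It evaluates the windowed transform directly from the sum rather than with a fast algorithm, and it assumes a Hann analysis window; the function names and the test tone are hypothetical.

```python
import cmath
import math


def hann(N):
    """Periodic Hann window of length N (an assumed choice of analysis window)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / N) for n in range(N)]


def windowed_dft(x, w, l, H):
    """X_l(k) = sum_n w(n) x(n + l*H) exp(-j * 2*pi*k/N * n) for k = 0..N-1."""
    N = len(w)
    frame = x[l * H : l * H + N]
    if len(frame) < N:
        raise ValueError("frame l extends beyond the end of the signal")
    return [
        sum(w[n] * frame[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]


def most_prominent_bin(X):
    """Index of the largest-magnitude bin, ignoring DC and the mirrored upper half."""
    return max(range(1, len(X) // 2), key=lambda k: abs(X[k]))


# Hypothetical test tone: a sinusoid centred on bin 6 of a 32-sample frame.
N, H = 32, 16
x = [math.sin(2.0 * math.pi * 6 * n / N) for n in range(64)]
w = hann(N)
peak_frame0 = most_prominent_bin(windowed_dft(x, w, 0, H))
peak_frame1 = most_prominent_bin(windowed_dft(x, w, 1, H))
print(peak_frame0, peak_frame1)
```

The direct sum costs on the order of N² operations per frame; a practical implementation would use a Fast Fourier Transform routine, which gives the same X_l(k).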

Peak-picking methods are described in the following literature:

-   X. Serra, “A system for sound analysis/transformation/synthesis based on a deterministic plus stochastic decomposition”, Ph.D. Dissertation, Stanford University, 1990,
-   X. Serra, J. O. Smith, “A system for Sound Analysis/Transformation/Synthesis based on a Deterministic plus Stochastic Decomposition”, SIGNAL PROCESSING V. Theories and Applications, 1990,
-   M. Goodwin, “ADAPTIVE SIGNAL MODELS. Theory, Algorithms and Audio Applications”, Kluwer Academic Publishers, 1998,
-   M. Goodwin, “Residual modelling in music analysis-synthesis”, in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, 1996, pp. 1005-1008,
-   X. Rodet, “Musical Sound Signal Analysis/Synthesis: Sinusoidal+Residual and Elementary Waveform Models”, Proc. of 2^(nd) IEEE symp. on applications of time-frequency and time-scale methods, 1997, pp. 111-120,
-   G. Peeters, X. Rodet, “Signal Characterization in terms of Sinusoidal and Non-Sinusoidal Components”, Digital Audio Effects, 1998,
-   B. Doval, X. Rodet, “Fundamental frequency estimation and tracking using maximum likelihood”, in Proc. of ICASSP '93, 1993, pp. 221-224.

Another useful determination technique is psychoacoustical matching pursuit presented in R. Heusdens and S. van de Par (2001), “Rate-distortion optimal sinusoidal modelling of audio and speech using psychoacoustical matching pursuits,” in Proc. IEEE Int. Conf. Acoust., Speech and Signal Process., Orlando, USA, May 13-17, 2002. This method iteratively determines the sinusoidal component that is perceptually most relevant.

In step 200, an importance measure may be determined for the first sound frame. The first sound frame is an input to this method, and, as will be further discussed at the end of the method, the method may be applied to sound frames comprising a song or other logically connected sound content. The importance measure is generally used to decide whether the remaining signal or residual, i.e. the first sound frame without any sinusoidal component(s) already determined and extracted (see the next steps), no longer contains important tonal components, or whether there are probably still some important tonal (sinusoidal) components left in said first sound frame. In the first case, the method must be stopped; in the second case, the method may be continued.

It is important to note that the first sound frame, especially during iteration of steps 100 and 300, may currently comprise fewer sinusoidal components, since each time a sinusoidal component is determined in step 100, it is subsequently removed from the first sound frame in step 300.

Said importance measure may be based on auditory perception, i.e., the human response to sound. A possible implementation of such a measure is a psychoacoustic energy level measure that comprises at least one of:

$$\text{detectability},\quad D_{m} = \sum_{f} R_{m}(f)\,a(f) = \sum_{f} \frac{R_{m}(f)}{\mathrm{msk}(f)}$$

$$\begin{aligned}
\mathrm{reduction}_{D_m}(m) &= 100 - \frac{100\,D_{m}}{D_{m-1}}\ (\%) \\
&= 100\left(1 - \frac{D_{m}}{D_{m-1}}\right) \\
&= 100\left(\frac{\Delta D}{D_{m-1}}\right)
\end{aligned}$$

R_(m)(f) is a power spectrum of the first sound frame with possibly removed component(s). a(f) is the inverse of msk(f), i.e. a(f) = 1/msk(f), where msk(f) is a masking threshold of the first sound frame computed in power, without any component(s) removed from it; f denotes the frequency bins; m is the current iteration number, representing how many times this step and the subsequent steps 300 and 400 have currently been performed, and is set to 0 at the start of the iteration(s); and ΔD = D_(m-1) - D_(m) is the decrement of said detectability. Said msk(f), the masking threshold of the first sound frame, may be computed prior to the method start, since it considers said first sound frame at a starting point, i.e. at a point where no components are removed from it. Conversely, R_(m)(f), the power spectrum of the first sound frame, may lack component(s), since they may be removed during the subsequent step 300; it is computed during the method execution and thereby reflects the current psychoacoustic energy level in the previously mentioned residual.

As an alternative to said perception measure, other more advanced perception measures may be considered. These advanced perception measures could, for example, take into account temporal characteristics of sound. In addition, importance measures that do not consider auditory perception are useful.

In step 300, the sinusoidal component may be extracted from the first sound frame, and incorporated into the second sound frame. Several implementations are possible here. In one embodiment, said sinusoidal component is extracted from the first sound frame only by means of its parameters (e.g. amplitude, phase, etc.), i.e. it is not physically removed. In this case, however, the method needs to keep track (e.g. by tagging, a note, etc.) of the fact that the sinusoidal component was actually extracted, in order to avoid extracting the exact same sinusoidal component in a subsequent iteration.
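The tagging variant can be sketched as follows, assuming for illustration that a component is identified by its spectral bin index and that the bookkeeping is a simple set of already extracted bins. The function name and the magnitude values are hypothetical.

```python
def next_unextracted_peak(magnitudes, extracted_bins):
    """Return the bin with the largest magnitude among bins not yet tagged.

    Returns None when every bin has already been tagged as extracted.
    """
    candidates = [k for k in range(len(magnitudes)) if k not in extracted_bins]
    if not candidates:
        return None
    return max(candidates, key=lambda k: magnitudes[k])


# Hypothetical magnitude spectrum; the frame itself is never modified.
magnitudes = [0.1, 3.0, 9.0, 4.0]
extracted = set()   # tags of components already moved to the second sound frame
order = []
for _ in range(3):
    k = next_unextracted_peak(magnitudes, extracted)
    order.append(k)
    extracted.add(k)  # tag instead of physically removing the component
print(order)
```

Because the spectrum is left untouched, the same peak would be found again in every iteration without the tag set; the tags are what make each run-through yield a new component.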

Alternatively, in the optional step 600, as claimed in “removing (600) the sinusoidal component from the first sound frame”, said sinusoidal component is removed from the first sound frame, i.e. it is in fact physically removed; this, however, requires more processing power.

In any of these cases, said second sound frame will currently incorporate the extracted sinusoidal component(s). For this reason, it only comprises sinusoidal components.

Said importance measure may fulfil said stop criterion when said detectability is equal to or lower than one. Alternatively, said importance measure may fulfil said stop criterion when said reduction is lower than a predetermined value.

During the method execution, it may be considered to switch from the detectability criterion to the reduction criterion, and vice versa.

In step 400, it may be decided to repeat said steps (100-300), optionally with said step 600 (of actually removing the sinusoidal component from said first sound frame), until the importance measure fulfils said stop criterion. It may be the case that the first sound frame still comprises more sinusoidal components; by an iteration of steps 100-300 (with m as the current iteration number representing how many times this step and steps 200 and 300 have currently been performed), a new, non-extracted sinusoidal component may be found in each run-through. Consequently, the first sound frame is each time left with one extracted component less. Optionally, as in step 600, the first sound frame is each time left with one sinusoidal component physically removed. This will correspondingly affect said importance measure, especially when, as in the optional step 600, the sinusoidal component is physically removed from said first sound frame.

It is worth noting that step 200 of determining an importance measure for the first sound frame may be executed before step 300, or may be executed between steps 300 and 400. This is possible since step 200 can be computed independently.

In step 500, as an optional step, the third sound frame may be set to the first sound frame when the importance measure fulfils one of the previously mentioned stop criteria. The first sound frame at this point only comprises non-important components, since the important sinusoidal components were removed in steps 100-400. In other words, the first sound frame at this point comprises residuals representing primarily non-tonal components or tonal components that are assumed to be unimportant. Said third sound frame, as a copy of the remaining first sound frame, may thus be understood as the previously mentioned residual or remaining part or signal when all important components, e.g. peaks, etc., as discussed in step 300, are physically extracted or at least carry a note or tag indicating that they (the important components) do not belong to said third sound frame.

The steps discussed so far can be summarized as in the following:

In the first iteration step, i.e. in step 100, the input frame, i.e. the first sound frame, is put into the method. Then, a sinusoidal component is determined (according to some criterion, for example, the energy maximum) and extracted from this frame, i.e. only the first sound frame is considered at this point. This results in a residual signal (the original input frame minus this component). Then, the importance, i.e. said importance measure, of the first sound frame (without any sinusoidal component possibly extracted) is determined. If the importance, as given by said importance measure, is high enough, it is not yet time to stop, and another iteration step will be made. The sinusoidal component will be added, in step 300, (i.e. extracted and moved) to said second sound frame. If the importance is not high enough, the method will stop. In the next iteration step, the residual (still the first sound frame, but with some sinusoidal components possibly extracted from it) is put into the method. Again, a sinusoidal component is determined among the non-extracted components and extracted. Its importance is determined by means of said importance measure on the first sound frame (without any sinusoidal component possibly extracted). If its importance, i.e. one of said importance measures, is high enough, the method will repeat, etc., corresponding to what is expressed in step 400.
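The iteration summarized above can be sketched end to end under simplifying assumptions that are not taken from the description: the extraction is purely energy-based (a least-squares fit of one bin-centred sinusoid per iteration), the importance measure is the energy reduction in percent, the component is physically removed as in the optional step 600, and the 50% threshold is a toy value suited only to this synthetic frame. All function names are hypothetical.

```python
import math
import random


def energy(x):
    return sum(v * v for v in x)


def strongest_sinusoid(x):
    """Step 100 (sketch): fit a sinusoid at each bin 1..N/2-1, return the strongest."""
    N = len(x)
    best = None
    for k in range(1, N // 2):
        c = [math.cos(2.0 * math.pi * k * n / N) for n in range(N)]
        s = [math.sin(2.0 * math.pi * k * n / N) for n in range(N)]
        a = 2.0 / N * sum(xv * cv for xv, cv in zip(x, c))
        b = 2.0 / N * sum(xv * sv for xv, sv in zip(x, s))
        power = a * a + b * b
        if best is None or power > best[0]:
            best = (power, k, [a * cv + b * sv for cv, sv in zip(c, s)])
    return best[1], best[2]


def separate(first, threshold_percent, max_iter=50):
    """Split a frame into a sinusoidal part (second) and a residual (third)."""
    residual = list(first)            # working copy of the first sound frame
    second = [0.0] * len(first)
    bins = []
    for _ in range(max_iter):
        e_prev = energy(residual)
        if e_prev == 0.0:
            break
        k, comp = strongest_sinusoid(residual)                   # step 100
        candidate = [r - c for r, c in zip(residual, comp)]
        reduction = 100.0 * (1.0 - energy(candidate) / e_prev)   # step 200
        if reduction < threshold_percent:                        # stop criterion
            break
        second = [s + c for s, c in zip(second, comp)]           # step 300
        residual = candidate                                     # step 600
        bins.append(k)
    return second, residual, bins                                # step 500


# Synthetic frame: two tones (bins 5 and 12) plus weak uniform noise.
N = 64
rng = random.Random(0)
frame = [
    math.sin(2.0 * math.pi * 5 * n / N)
    + 0.5 * math.cos(2.0 * math.pi * 12 * n / N)
    + rng.uniform(-0.05, 0.05)
    for n in range(N)
]
second, third, bins = separate(frame, threshold_percent=50.0)
print(bins, energy(third))
```

In this sketch the candidate component that fails the criterion is discarded rather than added to the second sound frame, so the second and third frames always sum back to the input frame. With a psychoacoustic extraction method, the energy reduction would be replaced by the Detectability reduction.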

So, the first sound frame is equal to the input frame in the first iteration step, and equal to the input frame minus the already extracted components, as a residual, in the other iteration steps. In each iteration step, a new sinusoidal component is extracted. The result is a new residual. This new residual is the third sound frame, corresponding to what is optionally executed in step 500. When the method has finalized its task, this new residual or third sound frame is the difference between said first sound frame and the extracted sinusoidal component(s).

The second sound frame is the sum of components that are extracted so far. It therefore represents the sinusoids.

The step 200, where the importance measure is determined, may be executed before step 300, or between steps 300 and 400.

The steps 100-400 may further be performed for one or more sound frames, i.e. a new set of said first, second and third sound frames, a new iteration number, etc., are correspondingly applied for each of said sound frames. Correspondingly, the optional steps 500 and 600 may further be applied. E.g. a song may be sub-divided into a number of frames, and by application of the steps 100-500, etc., each of these frames, each initially considered as a first sound frame, will be separated into a corresponding second sound frame representing sinusoids or tonal components and a corresponding optional third sound frame representing a residual.
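The sub-division of a song into frames can be sketched as below. The function name is hypothetical, and the frame length and hop size are arbitrary illustration values; each returned frame would then be treated as a first sound frame and passed through steps 100-500.

```python
def split_into_frames(signal, frame_len, hop):
    """Cut a signal into frames of frame_len samples, advancing hop samples each time.

    Trailing samples that do not fill a complete frame are dropped in this sketch.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [
        signal[start : start + frame_len]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


# Hypothetical ten-sample "song", frames of 4 samples with 50% overlap.
frames = split_into_frames(list(range(10)), frame_len=4, hop=2)
print(len(frames), frames[-1])
```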

As a consequence, the song will be separated into frames of sinusoidalsor tonal components and the residual, respectively. They are then readyto be used in a subsequent compression of the separated frames. Hereby,an optimal and efficient compression or coding of said song (separatedin said parts) may then be achieved.

Usually, the method will start all over again as long as the arrangementis powered. Otherwise, the method may terminate in step 400 (oroptionally in step 500 or 600); however, when the arrangement is poweredagain, etc, the method may proceed from step 100.

FIG. 4 shows an arrangement for sound processing. The arrangement may be used to perform the methods discussed in the foregoing figures.

The arrangement is shown by reference numeral 410 and may comprise an input for a sound signal, reference numeral 10, e.g. said first sound frame. Correspondingly, it may further comprise outputs, reference numerals 20 and 30, for said second and third sound frames into which said first sound frame is separated. All of said sound frames may be connected to a processor, reference numeral 401. In a typical application, the processor may perform the separation (into sound signals) as discussed in the foregoing figures.

Said sound signal(s) may designate human speech, audio, music, tonal and non-tonal components, or coloured and non-coloured noise in any combination during their processing.

The arrangement may be cascade coupled to like or similar arrangements for serial coupling of sound signals. Additionally or alternatively, arrangements may be parallel coupled for parallel processing of sound signals.

A computer readable medium may be magnetic tape, optical disc, digital video disk (DVD), compact disc (CD recordable or CD writeable), mini-disc, hard disk, floppy disk, smart card, PCMCIA card, etc.

In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word “comprising” does not exclude the presence of elements or steps other than those listed in a claim. The word “a” or “an” preceding an element does not exclude the presence of a plurality of such elements.

The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the device claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

1. A method of determining a second sound frame representing sinusoidal components and an optional third sound frame representing a residual from a provided first sound frame, the method comprising the steps of: determining a sinusoidal component in the first sound frame among non-extracted components; determining an importance measure for the first sound frame; extracting the sinusoidal component from the first sound frame, and incorporating the sinusoidal component in the second sound frame; and repeating said steps until the importance measure fulfils a stop criterion; wherein the step of determining an importance measure for the first sound frame is executed before step 300, or is executed between step 300 and 400.
2. A method according to claim 1, characterized in that the method further comprises the step of: setting the third sound frame to the first sound frame, when the importance measure fulfils said stop criterion.
3. A method according to claim 1, characterized in that the step of extracting the sinusoidal component from the first sound frame, and incorporating the sinusoidal component in the second sound frame further comprises the step of: removing the sinusoidal component from the first sound frame.
4. A method according to claim 1, characterized in that the importance measure is an energy measure.
5. A method according to claim 1, characterized in that the importance measure takes into account psycho-acoustical information, such as a human response to sound.
6. A method according to claim 1, characterized in that the importance measure fulfils said stop criterion when a perception measure considers the first sound frame as being unimportant, and wherein said perception measure represents an ear's perception of sound.
7. A method according to claim 1, characterized in that the importance measure is a psychoacoustic energy level measure comprising at least one of: detectability, $D_{m} = \sum\limits_{f} R_{m}(f)\,a(f) = \sum\limits_{f} \frac{R_{m}(f)}{\mathrm{msk}(f)}$, and $\mathrm{reduction}_{D_{m}}(m) = 100 - \frac{100\,D_{m}}{D_{m-1}}\ (\%) = 100\left(1 - \frac{D_{m}}{D_{m-1}}\right) = 100\left(\frac{\Delta D}{D_{m-1}}\right)$, wherein R_(m)(f) is a power spectrum of the first sound frame with possibly removed component(s), a(f) is the inverse function of msk(f), a masking threshold of the first sound frame computed in power, f denotes the frequency bins, m is a current iteration number representing how many times the steps 100-300 are currently performed, m is set to 0 at start of the iterations, and ΔD is the increment of said detectability.
8. A method according to claim 1, characterized in that the importance measure fulfils said stop criterion when said detectability is equal to or lower than one.
9. A method according to claim 1, characterized in that the importance measure fulfils said stop criterion when said reduction is lower than a predetermined value.
10. A method according to claim 1, characterized in that said steps, optionally with steps 500 and 600, are further performed for at least one more sound frame, wherein a new set of said first, second and third sound frames is correspondingly applied and generated.
11. A computer system for performing the method according to claim 1.
12. A computer program product comprising program code means stored on a computer readable medium for performing the method of claim 1 when the computer program is run on a computer.
13. An arrangement comprising means for carrying out the steps of said method.